End-to-End Learning of Representations for Asynchronous Event-Based Data
Event cameras are vision sensors that record asynchronous streams of
per-pixel brightness changes, referred to as "events". They have appealing
advantages over frame-based cameras for computer vision, including high
temporal resolution, high dynamic range, and no motion blur. Due to the sparse,
non-uniform spatiotemporal layout of the event signal, pattern recognition
algorithms typically aggregate events into a grid-based representation and
subsequently process it by a standard vision pipeline, e.g., Convolutional
Neural Network (CNN). In this work, we introduce a general framework to convert
event streams into grid-based representations through a sequence of
differentiable operations. Our framework comes with two main advantages: (i)
it allows learning the input event representation together with the
task-dedicated network in an end-to-end manner, and (ii) it lays out a taxonomy that unifies the
majority of extant event representations in the literature and identifies novel
ones. Empirically, we show that our approach to learning the event
representation end-to-end yields an improvement of approximately 12% on optical
flow estimation and object recognition over state-of-the-art methods.
Comment: To appear at ICCV 2019.
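To make the idea concrete, here is a minimal sketch (in PyTorch, with illustrative tensor names) of one member of this taxonomy: a voxel grid built with a differentiable triangular temporal kernel. The learned kernels in the paper generalize this fixed choice.

    import torch

    def events_to_voxel_grid(x, y, t, p, H, W, B):
        # Accumulate events (pixel x, y, timestamp t, polarity p in {-1, +1})
        # into a B x H x W grid. The triangular kernel makes the mapping
        # differentiable in t, so gradients can flow back through it.
        p = p.float()
        t = (t - t[0]) / (t[-1] - t[0] + 1e-9) * (B - 1)  # normalize time to [0, B-1]
        grid = torch.zeros(B, H, W)
        for b in range(B):
            weight = torch.clamp(1.0 - torch.abs(t - b), min=0.0)  # triangular kernel
            flat_idx = y.long() * W + x.long()
            grid[b].view(-1).index_add_(0, flat_idx, weight * p)
        return grid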
Learning Depth With Very Sparse Supervision
Motivated by the astonishing capabilities of natural intelligent agents and
inspired by theories from psychology, this paper explores the idea that
perception gets coupled to 3D properties of the world via interaction with the
environment. Existing works for depth estimation require either massive amounts
of annotated training data or some form of hard-coded geometrical constraint.
This paper explores a new approach to learning depth perception requiring
neither of those. Specifically, we train a specialized global-local network
architecture with what would be available to a robot interacting with the
environment: from extremely sparse depth measurements down to even a single
pixel per image. From a pair of consecutive images, our proposed network
outputs a latent representation of the observer's motion between the images and
a dense depth map. Experiments on several datasets show that, when ground truth
is available even for just one of the image pixels, the proposed network can
learn monocular dense depth estimation up to 22.5% more accurately than
state-of-the-art approaches. We believe that this work, beyond its scientific
interest, lays the foundations to learn depth from extremely sparse
supervision, which can be valuable to all robotic systems acting under severe
bandwidth or sensing constraints.
Comment: Accepted for publication at the IEEE Robotics and Automation Letters
(RA-L) 2020 and the International Conference on Intelligent Robots and Systems
(IROS) 2020.
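As a rough illustration of what such extremely sparse supervision means in practice, a masked loss of the following form penalizes the predicted depth only at the handful of annotated pixels. This is a hypothetical sketch; the paper's actual loss and architecture may differ.

    import torch

    def sparse_depth_loss(pred_depth, gt_depth, mask):
        # mask is 1 at the few (possibly just one) pixels with ground truth
        # and 0 elsewhere; all other pixels get no direct depth signal.
        abs_err = torch.abs(pred_depth - gt_depth)
        return (abs_err * mask).sum() / mask.sum().clamp(min=1)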
Learning Visual Locomotion with Cross-Modal Supervision
In this work, we show how to learn a visual walking policy that only uses a
monocular RGB camera and proprioception. Since simulating RGB is hard, we
necessarily have to learn vision in the real world. We start with a blind
walking policy trained in simulation. This policy can traverse some terrains in
the real world but often struggles since it lacks knowledge of the upcoming
geometry. This can be resolved with the use of vision. We train a visual module
in the real world to predict the upcoming terrain with our proposed algorithm
Cross-Modal Supervision (CMS). CMS uses time-shifted proprioception to
supervise vision and allows the policy to continually improve with more
real-world experience. We evaluate our vision-based walking policy over a
diverse set of terrains including stairs (up to 19 cm high), slippery slopes
(inclination of 35 degrees), curbs and tall steps (up to 20 cm), and complex
discrete terrains. We achieve this performance with less than 30 minutes of
real-world data. Finally, we show that our policy can adapt to shifts in the
visual field with a limited amount of real-world experience. Video results and
code at https://antonilo.github.io/vision_locomotion/.
Comment: Learning to walk from pixels in the real world by using
proprioception as supervision. Project page for videos and code:
https://antonilo.github.io/vision_locomotion
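The core of CMS can be summarized with a short, hypothetical training step: the proprioceptive terrain estimate at a later time t + k labels the image at time t, so no manual annotation is needed. Names and the loss below are illustrative, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def cms_step(vision_module, optimizer, image_t, proprio_label_t_plus_k):
        # Time-shifted proprioception acts as the supervision signal:
        # the image must predict terrain the robot will feel k steps later.
        pred = vision_module(image_t)
        loss = F.mse_loss(pred, proprio_label_t_plus_k)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()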
Event-based Vision meets Deep Learning on Steering Prediction for Self-driving Cars
Event cameras are bio-inspired vision sensors that naturally capture the
dynamics of a scene, filtering out redundant information. This paper presents a
deep neural network approach that unlocks the potential of event cameras on a
challenging motion-estimation task: prediction of a vehicle's steering angle.
To make the best out of this sensor-algorithm combination, we adapt
state-of-the-art convolutional architectures to the output of event sensors and
extensively evaluate the performance of our approach on a publicly available
large-scale event-camera dataset (~1000 km). We present qualitative and
quantitative explanations of why event cameras allow robust steering prediction
even in cases where traditional cameras fail, e.g., challenging illumination
conditions and fast motion. Finally, we demonstrate the advantages of
leveraging transfer learning from traditional to event-based vision, and show
that our approach outperforms state-of-the-art algorithms based on standard
cameras.
Comment: 9 pages, 8 figures, 6 tables. Video: https://youtu.be/_r_bsjkJTH
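One simple way to adapt a standard CNN to event-sensor output, consistent with the pipeline described above, is to render each fixed-duration slice of events into a two-channel count image before regression. This is a hedged sketch; the paper's exact representation may differ.

    import numpy as np

    def events_to_frame(x, y, p, H, W):
        # Two-channel count image: channel 0 counts positive-polarity
        # events, channel 1 negative ones. The result can be fed to a
        # standard CNN that regresses the steering angle.
        frame = np.zeros((2, H, W), dtype=np.float32)
        np.add.at(frame[0], (y[p > 0], x[p > 0]), 1.0)
        np.add.at(frame[1], (y[p < 0], x[p < 0]), 1.0)
        return frame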
A 64mW DNN-based Visual Navigation Engine for Autonomous Nano-Drones
Fully autonomous miniaturized robots (e.g., drones) with artificial
intelligence (AI)-based visual navigation capabilities are extremely
challenging drivers of Internet-of-Things edge intelligence. Visual navigation
based on AI approaches, such as deep neural networks (DNNs), is becoming
pervasive for standard-size drones, but is considered out of reach for
nano-drones with a size of a few cm. In this work, we
present the first (to the best of our knowledge) demonstration of a navigation
engine for autonomous nano-drones capable of closed-loop end-to-end DNN-based
visual navigation. To achieve this goal we developed a complete methodology for
parallel execution of complex DNNs directly on board resource-constrained
milliwatt-scale nodes. Our system is based on GAP8, a novel parallel
ultra-low-power computing platform, and a 27 g commercial, open-source
Crazyflie 2.0 nano-quadrotor. As part of our general methodology, we discuss the
software mapping techniques that enable the state-of-the-art deep convolutional
neural network presented in [1] to be fully executed on-board within a strict 6
fps real-time constraint with no compromise in terms of flight results, while
all processing is done with only 64 mW on average. Our navigation engine is
flexible and can be used to span a wide performance range: at its peak
performance corner it achieves 18 fps while still consuming on average just
3.5% of the power envelope of the deployed nano-aircraft.
Comment: 15 pages, 13 figures, 5 tables, 2 listings, accepted for publication
in the IEEE Internet of Things Journal (IEEE IOTJ).
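For a sense of scale, the numbers quoted above imply the following per-frame budgets (a back-of-the-envelope check, not a figure from the paper):

    # At the 6 fps operating point with 64 mW average power:
    fps = 6.0
    avg_power_mw = 64.0
    time_budget_ms = 1000.0 / fps             # ~167 ms of compute per frame
    energy_per_frame_mj = avg_power_mw / fps  # mW * s = mJ, ~10.7 mJ per frame
    print(f"{time_budget_ms:.0f} ms, {energy_per_frame_mj:.1f} mJ per frame")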
A General Framework for Uncertainty Estimation in Deep Learning
Neural network predictions are unreliable when the input sample is out of
the training distribution or corrupted by noise. Being able to detect such
failures automatically is fundamental for integrating deep learning algorithms
into robotics. Current approaches for uncertainty estimation of neural networks
require changes to the network and optimization process, typically ignore prior
knowledge about the data, and tend to make over-simplifying assumptions which
underestimate uncertainty. To address these limitations, we propose a novel
framework for uncertainty estimation. Based on Bayesian belief networks and
Monte-Carlo sampling, our framework not only fully models the different sources
of prediction uncertainty, but also incorporates prior data information, e.g.,
sensor noise. We show theoretically that this gives us the ability to capture
uncertainty better than existing methods. In addition, our framework has
several desirable properties: (i) it is agnostic to the network architecture
and task; (ii) it does not require changes in the optimization process; (iii)
it can be applied to already trained architectures. We thoroughly validate the
proposed framework through extensive experiments on both computer vision and
control tasks, where we outperform previous methods by up to 23% in accuracy.
Comment: Accepted for publication in the Robotics and Automation Letters 2020
and for presentation at the International Conference on Robotics and
Automation (ICRA) 2020.
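The Monte-Carlo part of such a framework can be sketched generically as follows: sample stochastic forward passes and, if the sensor noise is known, inject it into the input. This is a simplified sketch, not the paper's full Bayesian-belief-network formulation; note that it works on an already trained model without touching the optimization process.

    import torch

    def mc_predict(model, x, num_samples=32, input_noise_std=0.0):
        # Predictive mean and variance from Monte-Carlo sampling.
        model.train()  # keep stochastic layers (e.g., dropout) active
        preds = []
        for _ in range(num_samples):
            noisy_x = x + input_noise_std * torch.randn_like(x)  # prior sensor noise
            preds.append(model(noisy_x))
        preds = torch.stack(preds)
        return preds.mean(dim=0), preds.var(dim=0)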
Deep Drone Racing: From Simulation to Reality with Domain Randomization
Dynamically changing environments, unreliable state estimation, and operation
under severe resource constraints are fundamental challenges that limit the
deployment of small autonomous drones. We address these challenges in the
context of autonomous, vision-based drone racing in dynamic environments. A
racing drone must traverse a track with possibly moving gates at high speed. We
enable this functionality by combining the performance of a state-of-the-art
planning and control system with the perceptual awareness of a convolutional
neural network (CNN). The resulting modular system is both platform- and
domain-independent: it is trained in simulation and deployed on a physical
quadrotor without any fine-tuning. The abundance of simulated data, generated
via domain randomization, makes our system robust to changes in illumination
and gate appearance. To the best of our knowledge, our approach is the first to
demonstrate zero-shot sim-to-real transfer on the task of agile drone flight.
We extensively test the precision and robustness of our system, both in
simulation and on a physical platform, and show significant improvements over
the state of the art.
Comment: Accepted as a Regular Paper to the IEEE Transactions on Robotics
Journal. arXiv admin note: substantial text overlap with arXiv:1806.0854
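Domain randomization of the kind described above boils down to sampling wide scene variations per training episode; the parameters and ranges below are purely illustrative, not the paper's actual configuration.

    import random

    def randomize_scene():
        # Sample one simulated scene; broad variation in illumination and
        # gate appearance is what makes the perception robust at test time.
        return {
            "light_intensity": random.uniform(0.2, 2.0),
            "light_azimuth_deg": random.uniform(0.0, 360.0),
            "gate_hue": random.uniform(0.0, 1.0),
            "gate_texture_id": random.randrange(50),
            "background_id": random.randrange(100),
        }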
AutoTune: Controller Tuning for High-Speed Flight
Due to noisy actuation and external disturbances, tuning controllers for
high-speed flight is very challenging. In this paper, we ask the following
questions: How sensitive are controllers to tuning when tracking high-speed
maneuvers? What algorithms can we use to automatically tune them? To answer the
first question, we study the relationship between parameters and performance
and find that the faster the maneuver, the more sensitive a controller
becomes to its parameters. To answer the second question, we review existing
methods for controller tuning and discover that prior works often perform
poorly on the task of high-speed flight. Therefore, we propose AutoTune, a
sampling-based tuning algorithm specifically tailored to high-speed flight. In
contrast to previous work, our algorithm does not assume any prior knowledge of
the drone or its optimization function and can deal with the multi-modal
characteristics of the parameters' optimization space. We thoroughly evaluate
AutoTune both in simulation and in the physical world. In our experiments, we
outperform existing tuning algorithms by up to 90% in trajectory completion.
The resulting controllers are tested in the AirSim Game of Drones competition,
where we outperform the winner by up to 25% in lap-time. Finally, we show that
AutoTune improves tracking error when flying a physical platform with respect
to parameters tuned by a human expert.
Comment: Video: https://youtu.be/m2q_y7C01So; Code:
https://github.com/uzh-rpg/mh_autotun
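In spirit, a sampling-based tuner of this kind iterates propose/evaluate/accept. The acceptance rule below is a generic stochastic one, not necessarily AutoTune's; the evaluate function (e.g., trajectory completion on the target maneuver) is supplied by the user.

    import random

    def sample_tune(evaluate, theta, iters=100, sigma=0.1, explore=0.1):
        # Propose perturbed controller parameters, score them (e.g., by
        # trajectory completion), and occasionally accept worse samples so
        # that multi-modal parameter landscapes can still be explored.
        best_theta, best_score = list(theta), evaluate(theta)
        score = best_score
        for _ in range(iters):
            proposal = [v + random.gauss(0.0, sigma) for v in theta]
            new_score = evaluate(proposal)
            if new_score >= score or random.random() < explore:
                theta, score = proposal, new_score
                if score > best_score:
                    best_theta, best_score = list(theta), score
        return best_theta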
Flightmare: A Flexible Quadrotor Simulator
Currently available quadrotor simulators have a rigid and highly specialized
structure: they are either fast, physically accurate, or photo-realistic. In
this work, we propose a paradigm shift in the development
of simulators: moving the trade-off between accuracy and speed from the
developers to the end-users. We use this design idea to develop a novel modular
quadrotor simulator: Flightmare. Flightmare is composed of two main components:
a configurable rendering engine built on Unity and a flexible physics engine
for dynamics simulation. These two components are fully decoupled and can run
independently of each other. This makes our simulator extremely fast:
rendering achieves speeds of up to 230 Hz, while the physics simulation runs
at up to 200,000 Hz. In addition, Flightmare comes with several desirable features: (i)
a large multi-modal sensor suite, including an interface to extract the 3D
point-cloud of the scene; (ii) an API for reinforcement learning which can
simulate hundreds of quadrotors in parallel; and (iii) an integration with a
virtual-reality headset for interaction with the simulated environment. We
demonstrate the flexibility of Flightmare by using it for two completely
different robotic tasks: learning a sensorimotor control policy for a quadrotor
and path-planning in a complex 3D environment.
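The decoupling described above can be pictured as a simulator that steps dynamics at a high fixed rate and renders only on demand. Class and method names here are illustrative, not Flightmare's actual API.

    class DecoupledSim:
        # Physics runs at a fixed, very small time step; rendering is an
        # independent, optional request, so neither loop blocks the other.
        def __init__(self, physics, renderer, dt=1.0 / 200_000):
            self.physics, self.renderer, self.dt = physics, renderer, dt

        def step(self, state, command, render=False):
            next_state = self.physics.step(state, command, self.dt)
            image = self.renderer.render(next_state) if render else None
            return next_state, image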
Conformal Policy Learning for Sensorimotor Control Under Distribution Shifts
This paper focuses on the problem of detecting and reacting to changes in the
distribution of a sensorimotor controller's observables. The key idea is to
design switching policies that take conformal quantiles as input, an approach
we define as conformal policy learning, which allows robots to detect
distribution shifts with formal statistical guarantees. We show how to design
such policies by using conformal quantiles to switch between base policies with
different characteristics, e.g., safety or speed, or by directly augmenting a
policy's observation with a quantile and training it with reinforcement learning.
Theoretically, we show that such policies achieve formal convergence
guarantees in finite time. In addition, we thoroughly evaluate their advantages
and limitations on two compelling use cases: simulated autonomous driving and
active perception with a physical quadruped. Empirical results demonstrate that
our approach outperforms five baselines. It is also the simplest of the
compared strategies, apart from one ablation. Being easy to use, flexible, and with
formal guarantees, our work demonstrates how conformal prediction can be an
effective tool for sensorimotor learning under uncertainty.
Comment: Conformal Policy Learning
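The statistical core is the split-conformal quantile. Below is a minimal sketch of the quantile computation together with one of the two policy designs described above (threshold-based switching); all names are illustrative.

    import numpy as np

    def conformal_quantile(calibration_scores, alpha=0.1):
        # With n calibration nonconformity scores, the empirical quantile at
        # level ceil((n + 1) * (1 - alpha)) / n upper-bounds a fresh score
        # with probability at least 1 - alpha.
        n = len(calibration_scores)
        level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
        return np.quantile(calibration_scores, level)

    def switching_policy(score, threshold, fast_policy, safe_policy, obs):
        # Fall back to the safer base policy whenever the current score
        # exceeds the calibrated quantile, i.e., a shift is flagged.
        return safe_policy(obs) if score > threshold else fast_policy(obs)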